Probability and Statistics
Applied  to  the  Theory  of  Algorithms 
J.  Michael  Steele 
Grant  Number  DAAL03-89-G-0092 

I.  Nature  of  the  Problems  Studied. 

The  central  aim  of  the  problems  studied  under  this  grant  is 
to  understand  when  and  how  probability  is  useful  in  the  theory  of 
algorithms. Of the thirteen articles cited below that
acknowledge support from this grant, most use probability to
study a problem that arises in the theory of combinatorial
optimization. The most famous
problems  of  Euclidean  combinatorial  optimization  are  perhaps  the 
Euclidean  traveling  salesman  problem,  the  minimal  spanning  tree 
problem,  and  the  minimal  matching  problem.  Probability  enters  the 
study  of  such  problems  in  several  ways,  but  one  of  the  most 
natural  and  direct  is  through  the  development  of  stochastic 
models  for  the  problem  inputs.  One  then  uses  probability  theory 
to  understand  as  deeply  as  possible  the  behavior  of  the 
associated  objective  functions.  All  but  a  few  of  the  articles 
reviewed  here  take  this  route;  but,  ironically,  there  are 
exceptions  that  turn  out  to  have  had  substantial  impact.  In 
particular,  it  has  proved  useful  to  pursue  the  analogy  between 
worst-case  and  average-case  behaviors.  Also,  there  are  two 
papers  that  are  best  viewed  as  addressing  targets  of  opportunity. 
Finally, as we note in the last section of the report, work has
been done under the subsequent Grant DAAL-03-92-G-0110 that
further contributes to the developments reported on here.


II.  Summary  of  Most  Important  Results 

An  Inverse  Problem  of  Computational  Statistics 

The  most  acknowledged  of  the  articles  that  were  supported  by 
this  grant  is  the  piece  with  Richard  D.  De  Veaux,  "ACE  guided 
transformation  method  for  the  estimation  of  the  coefficient  of 
soil water diffusivity", Technometrics, 31, (1989), 91-98. This
article  received  the  Wilcoxon  Prize  for  the  outstanding  article 
to  appear  in  Technometrics  in  1989.  This  article  is  not  in  the 
direct  line  of  most  of  the  work  done  under  this  grant.  Still, 
there  are  natural  connections,  even  though  the  piece  is  best 
understood  as  a  successful  response  to  a  target  of  opportunity. 

The theme of the article with De Veaux is a central one of
applied statistics: the choice of a transformation of data.
The main twist that we provide is to detail the first application
of a systematic automated search for such a transformation in the
context of an inverse problem, which, in our case, comes from the
problem of estimating the coefficients in the basic equation of
soil-water diffusion.

Several  favorable  circumstances  came  together  in  the 
article. First, there is considerable benefit in beginning with
a serious scientific problem; although the estimation problem may
sound specialized, there are hundreds of articles on soil-water
diffusivity.  A  second  critical  contribution  to  the  project  was 
the  fortuitous  contact  at  Princeton  with  soil  scientists  who  were 
familiar  with  the  relevant  scientific  literature.  Finally,  the 
problem  offered  a  good  fit  with  earlier  work  since  it  required 
computational expertise, comfort with applications of PDEs, and
willingness  to  explore  a  large  number  of  alternatives  before 
settling  on  pragmatic  choices. 

At  a  minimum,  the  article  with  De  Veaux  showed  that  the  ACE 
algorithm of Breiman and Friedman is a genuinely useful tool of
computational  statistics.  It  also  made  progress  toward  showing 
that data-analytic thinking, which one most often sees in softer
social science applications, can make a contribution to a hard
science topic like the estimation technology of inverse
problems. Finally, the central success of the article is that it
provides a serious candidate for the method of choice in a widely
pursued application area.

Subadditive  Euclidean  Functionals  and  Related  Work 

The  majority  of  the  work  done  under  this  grant  is  related  in 
one  way  or  another  to  the  technology  of  subadditive  processes. 

At  the  root  of  this  theory  is  Kingman's  subadditive  ergodic 
theorem,  and  the  paper  of  that  title  offers  one  of  the  shortest 
proofs of Kingman's famous result. The most telling aspect of my
proof is that it is explicitly algorithmic. Thus, it provides a
point  of  view  that  is  not  often  seen  in  the  more  theoretical 
parts  of  probability.  For  example,  Kingman's  original  proof  was 
in  the  style  of  a  Hahn-Banach  existence  argument,  even  though 
Kingman  pointed  out  a  useful  analogy  to  linear  programming  in  his 
original  article. 

The  pieces  "Efficacy  of  spacefilling  heuristics  in  Euclidean 
combinatorial  optimization"  and  "Cost  of  sequential  connection 
for  points  in  space"  engage  what  one  can  call  the  theory  of  "n 
points  in  the  unit  square."  The  first  of  these  uses  some  results 
from  the  theory  of  tube  volumes  (a  basic  subject  of  differential 
geometry)  to  study  the  heuristics  that  can  be  based  on 
spacefilling curves. The second paper gives a simple but
powerful bound on functionals of the edge lengths of the path
obtained by sequentially inserting points into a tour of points
in the
square.  These  results  are  technical  but  they  provide  tools  that 
add  to  the  effectiveness  of  the  theory  of  subadditive  Euclidean 
functionals,  a  basic  theme  of  this  grant. 
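
As a rough illustration of a heuristic based on a spacefilling
curve, the following Python sketch orders points in the unit square
by their position along a Hilbert curve and visits them in that
order. The choice of the Hilbert curve, the grid resolution `order`,
and all function names are assumptions made for this illustration;
they are not taken from the paper.

```python
import math
import random


def hilbert_index(order, x, y):
    """Position of grid cell (x, y) along a Hilbert curve on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def spacefilling_tour(points, order=10):
    """Return point indices sorted by their position along the curve."""
    n = 1 << order

    def key(i):
        px, py = points[i]
        # Map a point of [0, 1]^2 to a grid cell, clamping the boundary.
        cx = min(int(px * n), n - 1)
        cy = min(int(py * n), n - 1)
        return hilbert_index(order, cx, cy)

    return sorted(range(len(points)), key=key)


def tour_length(points, tour):
    """Length of the closed tour that visits the points in the given order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.random(), rng.random()) for _ in range(1000)]
    tour = spacefilling_tour(pts)
    print(tour_length(pts, tour) / math.sqrt(len(pts)))
```

The heuristic needs only a sort, so its cost is dominated by the
O(n log n) ordering step; the printed ratio of tour length to the
square root of n gives a quick feel for how the heuristic scales.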

The  next  group  of  papers  that  deserve  review  are  those  that 
explore  the  behavior  of  the  objective  function  of  the  classical 
problems of geometric optimization in the so-called "worst
case." To recall a typical result of this field, we let $T(S)$
denote the length of the shortest tour through the points of
$S = \{x_1, x_2, \ldots, x_n\} \subset [0,1]^d$ and set

$\rho(n) = \max\{T(S) : |S| = n\}.$

The fact established in "Worst Case Growth Rates of Some
Classical Problems of Combinatorial Optimization" is that $\rho(n)$
is asymptotic to $\alpha n^{(d-1)/d}$ as $n$ goes to infinity, where
$\alpha$ is a positive constant that depends on the dimension
$d \ge 2$. One can see without difficulty that $\rho(n)$ is of
order $n^{(d-1)/d}$, but one needs to go rather more deeply into
the structure of the TSP in order to get an exact asymptotic
relationship.
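
To make the definition of T(S) concrete, the following Python sketch
computes the shortest tour length by brute force for a tiny lattice
in the unit square and compares it with the elementary lower bound
that comes from the minimum spacing between points. The lattice
example, the restriction to the plane, and the function names are
illustrative assumptions; the sketch covers only the easy
order-of-magnitude bound from below, not the exact asymptotic result.

```python
import itertools
import math


def shortest_tour_length(points):
    """Exact T(S) by brute force; feasible only for very small point sets."""
    first, rest = points[0], points[1:]
    best = math.inf
    # Fix the starting point and try every ordering of the remaining points.
    for perm in itertools.permutations(rest):
        path = (first,) + perm + (first,)
        length = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
        best = min(best, length)
    return best


def lattice(k):
    """k*k lattice points with spacing 1/k inside the unit square."""
    return [(i / k, j / k) for i in range(k) for j in range(k)]


def spacing_lower_bound(points):
    """A tour has n edges, each at least as long as the closest pair."""
    delta = min(math.dist(a, b) for a, b in itertools.combinations(points, 2))
    return len(points) * delta


if __name__ == "__main__":
    for k in (2, 3):
        pts = lattice(k)
        print(len(pts), shortest_tour_length(pts), spacing_lower_bound(pts))
```

For the k-by-k lattice the spacing bound equals k, which is the square
root of the number of points, so such configurations already force the
worst-case tour length in the plane to grow at least like the square
root of n.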

A  similar  theme  was  pursued  in  the  case  of  minimal  matching 
in "Worst Case Matchings in the Unit Cube". The form of the main
result is quite close to that for the TSP (and related
functionals), although the technical demands of the later paper
were  substantial.  The  central  difficulty  originates  in  the 
possibility  of  multiple  solutions  for  the  worst-case 
distributions  of  points. 

Exposition  and  New  Techniques 

There  are  three  articles  that  have  substantial  expositional 
content and that also offer some research progress: (1)
"Seedlings in the Theory of Shortest Paths", (2) "Probabilistic
and Worst Case Analyses of Classical Problems of Combinatorial
Optimization in Euclidean Space", and (3) "Probability and
Statistics in the Service of Computer Science: Illustrations
Using the Assignment Problem". The first of these has a number
of partially explored ideas (hence "seedlings"). The main result
of  the  article  is  a  new  martingale  proof  of  a  strong  tail  bound 
for  the  TSP.  The  technique  developed  there  has  been  used  in 
several  subsequent  works.  The  second  article  of  this  group  is 
the  broadest  survey  to  date  on  the  work  in  this  area,  and 
problems  posed  there  have  been  engaged  by  several  researchers 
including M. Talagrand, W.S. Rhee, D. Bertsimas, and K.
Alexander. The third article of the group is the most
expository, but it has also been followed up by several researchers
including David Aldous.

Semi-matchings  and  Connection  to  Linear  Constraints 

With "Euclidean Semi-Matchings of Random Samples", we return
to  the  central  theme  of  the  grant.  The  main  result  of  that 
article  is  a  probabilistic  limit  theorem  that  recalls  the 
Beardwood-Halton-Hammersley  Theorem  in  the  context  of  matching. 
The  most  innovative  aspect  of  the  article  is  that  it  connects  the 
theory  of  subadditive  Euclidean  functionals  to  the  theory  of 
linear  programming.  This  is  a  powerful  connection  that  has  not 
yet  been  fully  explored. 

Two Articles that are Atypical

Two  articles  that  should  be  singled  out  for  their  atypical 
nature  are  "Certifying  smoothness  of  discrete  functions  and 
measuring legitimacy of images," Journal of Complexity, 5,
(1989), 261-270, and "Models for managing secrets," Management
Science, 35, (1989), 240-248. The first of these is motivated by
one of the simplest questions that can be posed in the important
area of automatic target recognition (ATR), and the second
explores  some  very  simple  probability  models  that  aim  to  provide 
insight  into  the  processes  by  which  secrets  can  be  kept,  or 
disclosed. 

The  initial  motivation  for  "Models  for  managing  secrets" 
came from a line in the novel The Hunt for Red October, where a
Naval  commander  said  "The  likelihood  of  a  secret  being  blown  is 
proportional  to  the  square  of  the  number  of  people  who  are  in  on 
it."  The  aim  of  the  article  was  to  examine  the  possibility  of 
credible  probability  models  in  which  the  commander's  intuition 
corresponded  to  an  analytical  fact.  The  article  received  quite 
positive  reviews  and  was  quickly  published  in  Management  Science. 
Invited  presentations  were  given  on  the  article  at  several 
universities and the Institute for Defense Analyses.
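
One way to see how a square law can arise is the following Python
sketch. It uses a hypothetical toy model, assumed here purely for
illustration and not necessarily one of the models analyzed in the
article: each pair among the n people who know the secret
independently causes a leak with a small probability p. The function
names and parameter values are likewise assumptions.

```python
import random


def leak_probability(n, p):
    """Exact chance of at least one leak among the n*(n-1)/2 pairs."""
    pairs = n * (n - 1) // 2
    return 1.0 - (1.0 - p) ** pairs


def simulate_leak(n, p, trials=5000, seed=0):
    """Monte Carlo estimate of the same probability (toy model only)."""
    rng = random.Random(seed)
    pairs = n * (n - 1) // 2
    leaks = sum(
        any(rng.random() < p for _ in range(pairs)) for _ in range(trials)
    )
    return leaks / trials


if __name__ == "__main__":
    p = 1e-4
    for n in (5, 10, 20, 40):
        # For small p the probability is close to p * n * (n - 1) / 2,
        # so doubling n roughly quadruples the chance of a leak.
        print(n, leak_probability(n, p), p * n * (n - 1) / 2)
```

Under this assumption the leak probability is approximately
p n (n - 1) / 2 when p is small, which grows like the square of the
number of people who are in on the secret.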